
    The NLMS algorithm with time-variant optimum stepsize derived from a Bayesian network perspective

    In this article, we derive a new stepsize adaptation for the normalized least mean square (NLMS) algorithm by describing the task of linear acoustic echo cancellation from a Bayesian network perspective. Similar to the well-known Kalman filter equations, we model the acoustic wave propagation from the loudspeaker to the microphone by a latent state vector and define a linear observation equation (to model the relation between the state vector and the observation) as well as a linear process equation (to model the temporal progress of the state vector). Based on additional assumptions on the statistics of the random variables in the observation and process equations, we apply the expectation-maximization (EM) algorithm to derive an NLMS-like filter adaptation. By exploiting the conditional independence rules for Bayesian networks, we reveal that the resulting EM-NLMS algorithm has a stepsize update equivalent to the optimal-stepsize calculation proposed by Yamamoto and Kitayama in 1982, which has been adopted in many textbooks. The main difference is that the instantaneous stepsize value is estimated in the M step of the EM algorithm (instead of being approximated by artificially extending the acoustic echo path). The EM-NLMS algorithm is experimentally verified for synthesized scenarios with both white noise and male speech as input signals. Comment: 4 pages, 1 page of reference
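For orientation, the following is a minimal sketch of the standard NLMS update with a fixed stepsize, applied to a noise-free synthetic echo path. It is not the EM-NLMS algorithm of the article, whose time-variant stepsize is estimated in the M step; the function names, the three-tap `true_path`, and the values of `mu` and `eps` are illustrative assumptions.

```python
import random


def nlms(x, d, num_taps, mu=0.5, eps=1e-8):
    """Standard NLMS with a fixed stepsize mu (assumed, not the EM-derived stepsize)."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps input samples, newest first, zero-padded at the start.
        x_vec = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, x_vec))      # echo estimate
        e = d[n] - y                                      # a priori error
        norm = sum(xk * xk for xk in x_vec) + eps         # regularized input power
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x_vec)]
        errors.append(e)
    return w, errors


def simulate_echo(x, h):
    """Convolve the loudspeaker signal x with a (hypothetical) echo path h."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


rng = random.Random(0)
true_path = [0.8, -0.4, 0.2]                        # hypothetical echo path
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]      # white-noise input
d = simulate_echo(x, true_path)                     # noise-free microphone signal
w_est, errors = nlms(x, d, num_taps=3)
```

With white-noise input and no observation noise, `w_est` should approach `true_path`. A fixed `mu` trades convergence speed against steady-state error, which is the trade-off a time-variant stepsize is meant to resolve.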

    A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition

    This article provides a unifying Bayesian network view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well-known in the literature of robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules leading to a unified view on known derivations as well as to new formulations for certain approaches. The generic Bayesian perspective provided in this contribution thus highlights structural differences and similarities between the analyzed approaches.

    Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

    We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, compared both to logmelspec features extracted from noisy signals and to features enhanced by spectral subtraction. Comment: accepted for ICASSP201

    Working Memory and Response Inhibition as One Integral Phenotype of Adult ADHD? A Behavioral and Imaging Correlational Investigation

    Objective: It is an open question whether working memory (WM) and response inhibition (RI) constitute one integral phenotype in attention deficit hyperactivity disorder (ADHD). Method: The authors investigated 45 adult ADHD patients and 41 controls comparable for age, gender, intelligence, and education during a letter n-back and a stop-signal task, and measured prefrontal oxygenation by means of functional near-infrared spectroscopy. Results: The authors replicated behavioral and cortical activation deficits in patients compared with controls for both tasks and also for performance in both control conditions. In the patient group, 2-back performance was correlated with stop-signal reaction time. This correlation did not seem to be specific for WM and RI as 1-back performance was correlated with go reaction time. No significant correlations of prefrontal oxygenation between WM and RI were found. Conclusion: The authors' findings do not support the hypothesis of WM and RI representing one integral phenotype of ADHD mediated by the prefrontal cortex.

    Pneumothorax detection in chest radiographs: optimizing artificial intelligence system for accuracy and confounding bias reduction using in-image annotations in algorithm training

    OBJECTIVES Diagnostic accuracy of artificial intelligence (AI) pneumothorax (PTX) detection in chest radiographs (CXR) is limited by the noisy annotation quality of public training data and confounding thoracic tubes (TT). We hypothesize that in-image annotations of the dehiscent visceral pleura for algorithm training boost the algorithm's performance and suppress confounders. METHODS Our single-center evaluation cohort of 3062 supine CXRs includes 760 PTX-positive cases with radiological annotations of PTX size and inserted TTs. Three step-by-step improved algorithms (differing in algorithm architecture, training data from public datasets/clinical sites, and in-image annotations included in algorithm training) were characterized by the area under the receiver operating characteristic curve (AUROC) in detailed subgroup analyses and referenced to the well-established "CheXNet" algorithm. RESULTS Performances of established algorithms exclusively trained on publicly available data without in-image annotations are limited to AUROCs of 0.778 and strongly biased towards TTs, which can completely eliminate the algorithm's discriminative power in individual subgroups. In contrast, our final "algorithm 2", which was trained on a lower number of images but additionally with in-image annotations of the dehiscent pleura, achieved an overall AUROC of 0.877 for unilateral PTX detection with a significantly reduced TT-related confounding bias. CONCLUSIONS We demonstrated strong limitations of an established PTX-detecting AI algorithm that can be significantly reduced by designing an AI system capable of learning to both classify and localize PTX. Our results are aimed at drawing attention to the necessity of high-quality in-image localization in training data to reduce the risks of unintentionally biasing the training process of pathology-detecting AI algorithms.
KEY POINTS • Established pneumothorax-detecting artificial intelligence algorithms trained on public training data are strongly limited and biased by confounding thoracic tubes. • We used high-quality in-image annotated training data to effectively boost algorithm performance and suppress the impact of confounding thoracic tubes. • Based on our results, we hypothesize that even hidden confounders might be effectively addressed by in-image annotations of pathology-related image features.
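As a reading aid for the AUROC values quoted above, here is a minimal sketch of how an AUROC can be computed from binary labels and classifier scores, using the pairwise (Mann-Whitney) formulation. The function name and the four-case toy data are illustrative assumptions and are unrelated to the study's cohort or algorithms.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = PTX-positive, 0 = PTX-negative.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

Restricting `labels` and `scores` to a subgroup (for example, only cases with an inserted thoracic tube) before calling the function gives a subgroup AUROC, which is the kind of quantity a confounder analysis compares across subgroups.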
